
fix: fall back to CUDA events when CUPTI driver version < 13.0 #2818

Open

sha7doww wants to merge 4 commits into flashinfer-ai:main from sha7doww:fix/cupti-driver-version-fallback

Conversation


@sha7doww sha7doww commented Mar 19, 2026

Description

On systems where cupti-python >= 13 is installed but the CUDA driver is older than 13.0, cupti.activity_enable() raises NotSupportedError. Previously this call happened after the try block, so the exception was unhandled and bench_gpu_time_with_cupti crashed instead of falling back to CUDA events.

Root cause: The driver-support check was missing from the guarded import block.

Fix: Added a probe (activity_enable + activity_disable on RUNTIME) inside the existing try block so the NotSupportedError triggers the existing CUDA-event fallback path.

Related Issues

Fixes a crash when using bench_gpu_time(enable_cupti=True) on machines with CUDA driver < 13.0 but cupti-python >= 13 installed.

Pre-commit

  • pre-commit run --all-files passes (all 14 hooks green)

Tests

  • Added tests/utils/test_cupti_fallback.py — mocks cupti.activity_enable to raise an exception and verifies:
    • bench_gpu_time_with_cupti does not crash
    • Returns valid timing results via the CUDA-event fallback
    • Emits the expected UserWarning about falling back
$ pytest tests/utils/test_cupti_fallback.py -v
PASSED
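The shape of that test can be illustrated with a self-contained sketch that needs neither flashinfer nor torch; `bench`, `NotSupportedError`, and the mocked probe are stand-ins for the real names in tests/utils/test_cupti_fallback.py.

```python
import warnings
from unittest import mock


class NotSupportedError(RuntimeError):
    """Stand-in for cupti's driver-version error."""


def bench(probe):
    """Toy benchmark: try the CUPTI-style probe, fall back on failure."""
    try:
        probe()
        return ["cupti-timing"]
    except Exception:
        warnings.warn("Falling back to CUDA events", UserWarning)
        return ["cuda-event-timing"]


def test_fallback_on_probe_error():
    # Mock the probe to raise, mirroring cupti.activity_enable on an
    # old driver; assert the fallback result and the emitted warning.
    probe = mock.Mock(side_effect=NotSupportedError("driver < 13.0"))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = bench(probe)
    assert result == ["cuda-event-timing"]
    assert any("Falling back" in str(w.message) for w in caught)


test_fallback_on_probe_error()
```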

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved GPU benchmarking robustness by verifying both CUPTI version and runtime driver support before use; falls back to alternative CUDA benchmarking when CUPTI is unavailable or activity enabling fails, emitting a warning.
  • Tests

    • Added a test validating CUPTI fallback behavior, ensuring timing results remain positive and a fallback warning is issued when CUPTI activity enabling errors.

sha7doww and others added 2 commits March 19, 2026 14:46
Probe CUPTI activity tracing support inside the existing try block
so that NotSupportedError on older drivers is caught by the existing
fallback logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that bench_gpu_time_with_cupti gracefully falls back to CUDA
events when cupti.activity_enable raises (e.g. CUDA driver < 13.0).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue in the bench_gpu_time_with_cupti utility, preventing crashes on systems with newer cupti-python versions but older CUDA drivers. By introducing an early driver compatibility check, the system now gracefully falls back to CUDA events for GPU timing, enhancing robustness and user experience without requiring specific driver updates.

Highlights

  • bench_gpu_time_with_cupti crash fix: Resolved a crash in bench_gpu_time_with_cupti that occurred when cupti-python >= 13 was installed but the CUDA driver was older than 13.0, due to cupti.activity_enable() being called outside the guarded try block.
  • Driver support probe: Implemented an early driver support probe within the try block of bench_gpu_time_with_cupti using cupti.activity_enable() and cupti.activity_disable(). This ensures NotSupportedError is caught, triggering the existing CUDA-event fallback.
  • New test for CUPTI fallback: Added tests/utils/test_cupti_fallback.py to specifically test and verify the graceful fallback to CUDA events when CUPTI activity enablement fails, ensuring the function does not crash and provides valid timing results.



coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

The CUPTI availability check in the GPU benchmarking utility now requires both a cupti-python major version ≥13 and a successful runtime probe via cupti.activity_enable(...)/cupti.activity_disable(...); failures fall back to CUDA event/graph benchmarking. A new CUDA-gated test verifies graceful fallback when cupti.activity_enable raises an error.

Changes

  • CUPTI availability check (flashinfer/testing/utils.py): bench_gpu_time_with_cupti now performs a runtime driver probe (cupti.activity_enable / cupti.activity_disable) after the version check and treats failures as lack of CUPTI support, falling back to CUDA event/graph timing.
  • CUPTI fallback test (tests/utils/test_cupti_fallback.py): New CUDA-gated test test_cupti_fallback_on_activity_enable_error that mocks the cupti import and version, forces activity_enable to raise, captures warnings, and asserts fallback timings and a "Falling back" UserWarning.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • bkryu
  • cyx-6
  • kahyunnam
  • jimmyzho
  • nv-yunzheq

Poem

🐰 I poked at CUPTI with a curious snoop,
It coughed a shrug — I hopped to a loop.
Timings kept steady, not a single flop,
I bounced on CUDA and finished on top.
A whisker twitch, benchmarks won't stop.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title clearly describes the main fix, falling back to CUDA events when the CUPTI driver version is below 13.0, which matches the primary change in the changeset.
  • Description check: ✅ Passed. The description comprehensively covers what the PR does, the root cause, the fix, related issues, pre-commit verification, and test additions, following the repository template structure.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a crash when using bench_gpu_time_with_cupti on systems with an older CUDA driver but a newer cupti-python library. The fix involves probing for driver support within the try...except block to trigger the existing fallback mechanism. A comprehensive test case has been added to ensure the fallback to CUDA events works as expected when cupti.activity_enable raises an exception. The changes are logical and well-tested. I have one minor suggestion to improve code clarity in the new test file.

fake_module = MagicMock()
fake_module.cupti = fake_cupti

real_import = __builtins__.__import__ if hasattr(__builtins__, "__import__") else __import__

medium

This line is overly complex for getting a reference to the built-in __import__ function. In standard Python 3 environments, __import__ is directly available in the global scope. You can simplify this for better readability and maintainability.

Suggested change
real_import = __builtins__.__import__ if hasattr(__builtins__, "__import__") else __import__
real_import = __import__
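For context on why the simpler form is safe: in Python 3, `builtins.__import__` is always present, and rebinding it is how a test can intercept a single module's import while deferring everything else. A minimal sketch of the pattern (the module name `some_missing_backend` and the error message are illustrative, not part of this PR):

```python
import builtins

real_import = builtins.__import__  # the standard, always-present hook


def fake_import(name, *args, **kwargs):
    # Intercept only the module under test; defer everything else.
    if name == "some_missing_backend":
        raise ImportError("simulated unavailable backend")
    return real_import(name, *args, **kwargs)


builtins.__import__ = fake_import
try:
    try:
        import some_missing_backend  # noqa: F401

        fell_back = False
    except ImportError:
        fell_back = True
finally:
    builtins.__import__ = real_import  # always restore the real hook

assert fell_back
```

Restoring the original hook in a finally block matters: a leaked fake `__import__` would break every later import in the test session.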


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/utils/test_cupti_fallback.py (1)

10-45: Prefer bench_gpu_time(..., enable_cupti=True) in this test.

Line 39 currently tests the helper directly; switching to the unified benchmarking entrypoint better matches repository test guidance while still exercising the same fallback behavior.

♻️ Suggested update
-from flashinfer.testing import bench_gpu_time_with_cupti
+from flashinfer.testing import bench_gpu_time
@@
-            times = bench_gpu_time_with_cupti(
+            times = bench_gpu_time(
                 fn=torch.matmul,
                 input_args=(a, b),
                 repeat_iters=5,
                 dry_run_iters=2,
                 cold_l2_cache=False,
+                enable_cupti=True,
             )

As per coding guidelines tests/**/*.py: Use flashinfer.testing.bench_gpu_time() for benchmarking kernels, preferring CUPTI timing with auto-fallback to CUDA events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_cupti_fallback.py` around lines 10 - 45, The test currently
calls the internal helper bench_gpu_time_with_cupti; replace that call with the
public unified entrypoint bench_gpu_time(..., enable_cupti=True) so the test
exercises the same CUPTI fallback via the sanctioned API. Specifically, in
test_cupti_fallback_on_activity_enable_error swap the call to
bench_gpu_time_with_cupti(fn=torch.matmul, input_args=(a, b), repeat_iters=5,
dry_run_iters=2, cold_l2_cache=False) for a call to
flashinfer.testing.bench_gpu_time(fn=torch.matmul, input_args=(a, b),
repeat_iters=5, dry_run_iters=2, cold_l2_cache=False, enable_cupti=True) (or
import bench_gpu_time and call bench_gpu_time(..., enable_cupti=True)); keep the
same patches/mocks and warning capture around it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e79249fd-ec1c-45bd-8cbe-6aac24c0fd5d

📥 Commits

Reviewing files that changed from the base of the PR and between fc4e70f and 29e0f26.

📒 Files selected for processing (2)
  • flashinfer/testing/utils.py
  • tests/utils/test_cupti_fallback.py

- Use bench_gpu_time(enable_cupti=True) public API instead of
  bench_gpu_time_with_cupti directly (CodeRabbit suggestion)
- Combine nested with statements (ruff SIM117)
- Add docstrings to inner helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@yzh119 yzh119 left a comment


Hi @bkryu can you help review?

Address Gemini review suggestion — no need for __builtins__ check
in standard Python 3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/utils/test_cupti_fallback.py (1)

13-14: This test does not have GPU architecture-specific requirements—it only requires CUDA to be available. Given that FlashInfer assumes CUDA is available in test environments, the skip decorator may be unnecessary. If retaining it for defensive robustness, align with the project's approach: either use architecture checks from flashinfer.utils (when architecture-specific support is actually required) or use get_compute_capability() to query device properties. However, for a general "CUDA required" check without architecture constraints, this simple guard is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_cupti_fallback.py` around lines 13 - 14, Remove the
pytest.skipif decorator on test_cupti_fallback_on_activity_enable_error because
the test only needs CUDA availability and the project assumes CUDA in CI; simply
rely on the existing environment assumption, or if you prefer a defensive check,
replace the decorator with a project helper such as
flashinfer.utils.get_compute_capability() or a
has_cuda()/get_compute_capability() call to gate the test instead of using
pytest.mark.skipif(not torch.cuda.is_available()).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0212a953-4539-44d0-bf20-7a3604e1c6e3

📥 Commits

Reviewing files that changed from the base of the PR and between 2cd7e5d and bb2ee4a.

📒 Files selected for processing (1)
  • tests/utils/test_cupti_fallback.py
